Voice source analysis for pitch-scale modification of speech signals
Abstract
Much research has shown that the voice source has a strong influence on the quality of speech processing [4][5][6]. However, most existing speech modification algorithms neglect the effect of voice source variation. This work explains why the existing modification scheme cannot truly reflect voice source variation during pitch modification. We use synthesized voiced speech to compare an existing pitch modification scheme with our proposed modification scheme based on voice source scaling. Results show that pitch modification based on voice source scaling can be used over a wider range of pitch changes.

Keywords: speech pitch modification, voice source, formant synthesis.

1. Analysis of the voice source effects in speech modification

1.1. Pitch modification framework

Speech modification plays an important role in many areas of speech processing, for example text-to-speech synthesis, speech recognition, speaker recognition, and speech conversion. Much research has been done on time-scale and pitch-scale modification. Efficient speech synthesis and modification methods such as Pitch-Synchronous Overlap-Add (PSOLA) are widely used in many systems [1]. More recently, other speech modification models such as the sinusoidal model [2] and the harmonic plus noise model (HNM) [3] have also been presented. All of these methods are based on the source-filter model of speech production. The source-filter model consists of a source that generates a sequence of glottal pulses, a filter that models the vocal tract system and takes those pulses as its input, and a differentiation operator that models the radiation at the lips. It can be expressed as a convolution, as in Eq. 1:

$$s(t) = g(t) * v(t) * r(t) \qquad (1)$$

in which g(t) is the excitation signal to the vocal tract, corresponding to the glottal air flow that is injected into the vocal tract, v(t) is the vocal tract transfer function, and r(t) is the radiation at the mouth.
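As a rough discrete-time illustration of Eq. 1, the sketch below convolves a toy glottal pulse with a toy vocal tract response and a first-difference radiation term. The signal values and the `convolve` helper are hypothetical and chosen only for illustration; the first difference is a common stand-in for the differentiation operator, not something specified in the text. Because convolution is commutative, applying the radiation term before or after the vocal tract response gives the same output.

```python
def convolve(x, h):
    """Full linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

# Hypothetical toy signals (assumptions, not data from the paper).
g = [0.0, 0.5, 1.0, 0.5, 0.0]   # one crude glottal flow pulse
v = [1.0, 0.6, 0.36, 0.216]     # truncated, decaying vocal tract response
r = [1.0, -1.0]                 # first difference standing in for lip radiation

# Eq. 1: s = g * v * r
s = convolve(convolve(g, v), r)

# Same terms with radiation applied to the source first:
# the vocal tract then sees a differentiated glottal pulse.
s_swapped = convolve(convolve(g, r), v)
```

Both orderings yield the same samples up to floating-point rounding, which is why the radiation term can be folded into the source without changing the modelled speech signal.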
In most speech modification algorithms, the radiation part and the vocal tract part are interchanged, so the input to the vocal tract transfer function is a differentiated voice source waveform. Pitch modification normally includes the following steps:

1. Apply a window to the continuous speech signal to obtain a short-time frame.
2. Perform a source-filter decomposition of the frame to obtain the source signal. For voiced speech, the voice source is a series of impulses with a flat spectrum.
3. Modify the voice source to produce a new impulse train spaced at the required pitch period.
4. Apply the vocal tract filter to the modified voice source. The output is the desired pitch-scaled signal.

During the modification, the vocal tract transfer function v(t) remains unchanged. The input to the vocal tract system is replaced by a new excitation signal, which is then convolved with the vocal tract transfer function to obtain the modified signal. In this manner, the overall contour of the speech in the frequency domain is unchanged and the pitch can be modified independently, i.e. the speech duration remains unchanged while the pitch changes. Because the modified speech has the same spectral envelope as the original speech, it retains the intelligibility of the original speech.

1.2. Analysis of pitch modification by impulse train scaling

The speech modification methods mentioned above are highly efficient, but they lack naturalness and sometimes introduce distortion into the modified speech. An important aspect they miss is how the voice source behaves under different conditions. During modification, the vocal tract transfer function is obtained by estimating the linear prediction coefficients a(i) over the whole frame, which spans about 2-4 pitch periods. The residual signal, taken as the voice source, is then extracted from the speech waveform by inverse filtering, as represented in Eq. 2:
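The impulse-train scheme analysed in this section can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the predictor sign convention s_hat[n] = sum_i a(i) s[n - i], the autocorrelation method with the Levinson-Durbin recursion, the synthetic two-pole test frame, and all function names are assumptions made here. Windowing and gain matching are omitted for brevity.

```python
def autocorr(x, max_lag):
    """Autocorrelation R(0..max_lag) of one frame."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]

def lpc(x, order):
    """Levinson-Durbin recursion; returns a(1..order) for the assumed
    predictor s_hat[n] = sum_i a(i) * s[n - i]."""
    r = autocorr(x, order)
    a = [0.0] * (order + 1)          # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]

def inverse_filter(s, a):
    """Residual e[n] = s[n] - sum_i a(i) * s[n - i] (zero initial state)."""
    return [s[n] - sum(a[i] * s[n - 1 - i]
                       for i in range(len(a)) if n - 1 - i >= 0)
            for n in range(len(s))]

def all_pole_filter(e, a):
    """Vocal tract filter: y[n] = e[n] + sum_i a(i) * y[n - i]."""
    y = []
    for n in range(len(e)):
        y.append(e[n] + sum(a[i] * y[n - 1 - i]
                            for i in range(len(a)) if n - 1 - i >= 0))
    return y

def impulse_train(length, period):
    """Unit impulses every `period` samples, starting at n = 0."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]

# Hypothetical voiced frame: 4 pitch periods of 80 samples through a
# stable two-pole filter (values chosen only for illustration).
TRUE_A = [1.3, -0.8]
frame = all_pole_filter(impulse_train(320, 80), TRUE_A)

a_est = lpc(frame, order=2)                      # vocal tract estimate
residual = inverse_filter(frame, a_est)          # source by inverse filtering
new_excitation = impulse_train(len(frame), 60)   # shorter period, higher pitch
modified = all_pole_filter(new_excitation, a_est)

# Sanity check: filtering the residual recovers the original frame.
resynth = all_pole_filter(residual, a_est)
```

Note that the filter coefficients are estimated once over the whole frame and reused unchanged for the new excitation, so only the pulse spacing changes. Any pitch-dependent change in the shape of the glottal pulse itself has no way to enter the output, which is the limitation this section examines.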
Similar resources
Production Based Pitch Modifica
Previous research has shown that the voice source is strongly correlated with speech quality [1][2][3]. However, in many existing pitch modification algorithms, only the impulse train excitation is modified, while the voice source is normally included in the vocal tract transfer function and remains unchanged during modification. We present a production based pitch-scale modification scheme, which mo...
VLSI implementation of a TSM/FSM algorithm
The time scale modification (TSM) of speech is concerned with compressing or expanding audio signals in the time domain without affecting the signal's pitch or naturalness. Conversely, the frequency scale modification (FSM) of speech is concerned with altering the pitch and formants of a signal without changing the signal duration. This paper describes a hardware implemented and optimized...
A mixed-excitation frequency domain model for time-scale pitch-scale modification of speech
This paper presents a time-scale pitch-scale modification technique for concatenative speech synthesis. The method is based on a frequency domain source-filter model, where the source is modeled as a mixed excitation. This model is highly coupled with a compression scheme that results in compact acoustic inventories. When compared to the approach in the Whistler system using no mixed excitation,...
Improving Voice Outcomes After Injury to the Recurrent Laryngeal Nerve
Objectives: The present study aimed to determine the voice outcomes before and after the administration of voice therapy in patients who suffered an injury to the recurrent laryngeal nerve after undergoing thyroidectomy. Methods: The sample consisted of 26 patients (2 males and 24 females) aged between 18 and 80 years (m=55±12) who experienced injury to the recurrent laryngeal nerve fol...
Epoch-Synchronous Overlap-Add (ESOLA) for Time- and Pitch-Scale Modification of Speech Signals
Time- and pitch-scale modifications of speech signals find important applications in speech synthesis, playback systems, voice conversion, learning/hearing aids, etc. There is a requirement for computationally efficient and real-time implementable algorithms. In this paper, we propose a high quality and computationally efficient time- and pitch-scaling methodology based on the glottal closure inst...